Abstract. We describe a direct analog implementation of a neural network model of olfactory processing [44-48].
This  model  has  been  shown  capable  of  performing  hierarchical  clustering  as  a  result  of  a  coactivity-based  unsuper¬ 
vised  learning  rule  which  is  modeled  after  long-term  synaptic  potentiation.  Network  function  is  statistically  based 
and  does  not  require  highly  precise  weights  or  other  components.  We  present  current-mode  circuit  designs  to  im¬ 
plement  the  required  functions  in  CMOS  integrated  circuitry,  and  propose  the  use  of  floating-gate  MOS  transistors 
for  modifiable,  nonvolatile  interconnection  weights.  Methods  for  arrangement  of  these  weights  into  a  sparse  pseudo¬ 
random  interconnection  matrix,  and  for  parallel  implementation  of  the  learning  rule,  are  described.  Test  results 
from  functional  blocks  on  first  silicon  are  presented.  It  is  estimated  that  a  network  with  upwards  of  50K  weights 
and  with  submicrosecond  settling  times  could  be  built  with  a  conventional  CMOS  double-poly  process  and  die  size. 


1.  Introduction 

In  recent  years,  interest  in  neural  networks  and  neural- 
network-like  computational  models  has  seen  a  major 
resurgence,  due  at  least  in  part  to  the  prospect  of  com¬ 
pact  and  dense  implementation  of  these  networks  in 
analog  integrated  circuit  form.  A  number  of  widely 
studied  architectures  and  algorithms  are  based  on  adap¬ 
tations  of  conventional  statistical  and  numerical  tech¬ 
niques  which  admit  parallel  network  implementations 
(e.g., multilayer perceptrons with back-propagation
learning  [1],  learning  vector  quantization  [2],  and  radial 
basis  function  or  probabilistic  neural  networks  [3,  4]), 
or on analogy with physical systems (e.g., Hopfield networks
[5] and Boltzmann machines [6]). These might
be  properly  termed  artificial  neural  network  algorithms, 
with  emphasis  on  the  artificiality,  since  resemblance  to 
real  neural  networks  (beyond  the  parallel  structure  of 
interconnected  processing  units)  is  likely  to  be  either 
superficial  or  coincidental.  These  algorithms  have  been 
applied  with  some  success  to  a  number  of  problems, 
although  studies  of  them  have  been  conducted  almost 
exclusively  in  simulations.  Much  debate  has  centered  on 
the  relative  advantages,  and  even  feasibility,  of  analog 
versus digital implementations [7, 8]. With the architectures
and algorithms that are commonly reported, the
precision  with  which  interconnection  weights  can  be 
represented and the resolution of weight changes during
learning are important issues in both the digital and
analog  cases. 

Elucidation  of  the  computational  principles  used  in 
real  nervous  systems,  on  the  other  hand,  has  been  very 
limited  due  to  the  extreme  experimental  difficulties  en¬ 
countered  in  network  neuroscience.  Understanding  of 
collective  function  of  neural  networks  in  vertebrates  is 
largely  limited  to  sensory  structures  and  early  process¬ 
ing,  which  have  been  studied  in  the  greatest  depth  and 
with  the  most  success;  even  in  these  cases,  interpreta¬ 
tion  of  the  computational  principles  which  are  followed 
is  a  matter  of  current  research  [9-11]. 

A  number  of  the  direct  analog  implementations  of 
neural  networks  that  have  been  reported  to  date  consist 
of  building  blocks  that  are  suitable  for  the  artificial  par¬ 
adigms;  the  layered  heavily  interconnected  feedforward 
architecture  epitomized  by  the  multilayer  perceptron 
[12-18] or the reciprocally and symmetrically interconnected
architecture described by Hopfield [5] and Cohen
and Grossberg [19] are often targeted [20-22]. By way
of  contrast,  some  researchers,  most  notably  Mead  and 
co-workers,  have  attempted  to  build  reasonably  faithful 
analogs of biological neurons or networks [23-29],
which  are  generally  early  processing  structures  for  sen¬ 
sory  input.  Mueller  and  co-workers  have  reported  an 
intermediate  approach  with  a  chipset  retaining  some 
notable features of biological neurons but allowing programmable
interconnection into general networks [30].

An outstanding problem in analog networks is the
practical  implementation  of  learning,  which  in  the 
neural  network  field  usually  comprises  some  algo¬ 
rithmic  procedure  for  modification  of  interconnection 
weights  between  neuronal  analogs  in  response  to  stimuli 
and  possibly  desired  response  or  other  feedback  pre¬ 
sented  to  the  network.  Few  implementations  reported 
to  date  actually  include  learning  of  this  kind  on  chip 
[17, 20, 22]. Implementations of biologically inspired
networks  are  often  hardwired  [24-26],  although  a  few 
models  with  limited  adaptive  capabilities  have  been 
built [27, 28]. A central research issue for implementation
of the artificial learning paradigms is the
precision  with  which  weight  or  other  parameter  changes 
may  be  calculated  (dependent  upon  precision  of  com¬ 
ponents  such  as  weight  circuits)  and  imposed.  A  suit¬ 
able  analog  medium  for  long-term  storage  of  weights 
or  other  parameters  is  also  a  matter  of  current  research; 
floating-gate  MOS  or  MNOS  devices  have  been  pro¬ 
posed  for  this  purpose,  and  studied  by  a  number  of 
workers [12, 27, 31-36]. The potential due to the
charge  stored  on  such  a  structure  could  be  used  to  con¬ 
trol  the  conductance  of  a  transistor  or  transistors  in  a 
circuit  performing  the  weighting  function.  However,  the 
processes  by  which  the  stored  charge  may  be  altered 
require  either  UV  irradiation,  or  high  programming 
voltages to induce Fowler-Nordheim tunneling or hot-carrier
injection. In the latter cases particularly, the
charging  phenomena  are  very  nonlinear  and  sensitive 
to  geometries  and  processing  parameters  [37],  and  thus 
it  is  difficult  to  conceive  of  precise  modification  of 
analog  weights  without  some  kind  of  local  closed-loop 
control.  A  few  workers  have  proposed  modifications 
of  established  algorithms,  such  as  very  coarse  quan¬ 
tization  of  weight  updates  [38,  39],  which  circumvent 
the  need  for  imposition  of  precise  weight  changes,  but 
the  practicability  of  implementing  even  these  learning 
rules  in  parallel  in  analog  circuitry  remains  to  be 
demonstrated. 

In biological neural networks, modulation of synaptic
efficacy has long been regarded as a likely mechanism
for learning and memory [40], and the phenomenon
of long-term potentiation (LTP) as observed in
the  hippocampus,  limbic  system,  and  certain  cortical 
structures  is  one  candidate  for  this  type  of  mechanism 
[41-43]. Changes in synaptic strength due to LTP are
thought  to  be  rather  coarse  [43],  in  contrast  with  the 
graded  and  precise  weights  and  weight  changes  which 
are required by the artificial paradigms. How a nervous
system might work within such constraints to perform
useful computation and to learn effectively is a
question whose resolution is stymied by the paucity of
information on network-level function within the brain.
However, a potentially useful model for olfactory processing
has been proposed by Granger, Lynch, and
Ambros-Ingerson [44-48] which we believe provides
some preliminary answers to questions of this kind.
This  model  deals  with  the  interacting  structures  of  the 
olfactory  bulb  (which  receives  input  from  the  olfactory 
receptors  via  the  olfactory  nerve)  and  the  piriform  cor¬ 
tex,  as  they  appear  in  olfactory  mammals  such  as  the 
rodents  and  lagomorphs.  It  was  developed  to  study  the 
function  of  these  structures  based  on  their  known  anat¬ 
omy  and  physiology,  and  its  emergent  computational 
properties,  rather  than  appearing  by  design,  were  dis¬ 
covered  upon  analysis  of  simulation  results.  Function 
is  acquired  by  an  unsupervised  learning  rule,  effectively 
based  on  coactivity,  which  models  long-term  potentia¬ 
tion.  Operation  is  dependent  upon  the  statistical  prop¬ 
erties  of  large  assemblages  of  neurons  with  sparse,  com¬ 
binatorial  interconnections  and  coarse-valued  weights. 

In  this  paper,  we  discuss  this  model  and  the  features 
which  make  it  amenable  to  implementation,  and  we 
describe  ongoing  efforts  toward  such  an  implementa¬ 
tion  in  analog  CMOS  integrated  circuitry.  The  low- 
resolution  weights  and  coarse,  unidirectional  weight 
changes  allow  a  parallel  implementation  of  the  learn¬ 
ing  rule,  using  floating  gates  for  nonvolatile  analog 
weight  storage.  Designs  of  test  circuits  for  macrocells 
which  implement  the  required  functions  are  presented, 
and  the  integration  of  these  macrocells  into  a  complete 
network  is  discussed. 


2.  The  Model 

The  interested  reader  is  referred  to  the  work  of  Granger 
et al. for details of the olfactory model [44-48]. The
essential  features  of  the  model  which  are  relevant  to 
the  proposed  implementation  are  summarized  as  fol¬ 
lows.  The  olfactory  bulb  receives  input  from  the  olfac¬ 
tory  receptor  neurons  in  a  somewhat  topographic 
fashion: a particular type of receptor cell (i.e., a receptor
which responds to particular chemical stimuli) projects
its axons along with those of similar cells to a
delimited  area  of  the  olfactory  bulb  which  is  denoted 
a  glomerulus.  The  aggregate  firing  rate  of  these  input 
cells  is  regarded  as  the  input  to  the  corresponding 
glomerulus.  There  are  many  glomeruli  in  the  olfactory 
bulb,  each  associated  with  a  different  type  of  receptor 
cell,  and  thus  the  system  input  collectively  may  be 
regarded  as  a  vector.  The  input  components,  which  are 
excitatory, are first combined with inhibitory feedback
signals  to  be  discussed  below.  The  resulting  net  inputs 
are  subject  to  nonlinear  processing  (saturating  low  and 
high)  as  well  as  a  global  normalization,  mediated  by 
certain  inhibitory  cells,  which  limits  total  bulb  activ¬ 
ity.  The  mitral  cells,  or  excitatory  neurons  within  the 
olfactory  bulb,  are  regarded  as  two-state  or  McCulloch- 
Pitts  neurons,  which  are  either  quiescent  or  active. 
Those  within  each  glomerulus  have  a  range  of  differ¬ 
ing  excitation  thresholds  at  which  they  become  active. 
The normalization constrains the bulb so that only some
fraction  (on  the  order  of  20%  or  so)  of  all  mitral  cells 
do  in  fact  become  active  upon  stimulation.  The  net  ef¬ 
fect  of  the  processing  within  the  glomeruli  is  thus  as 
follows: the most significant components of the net input
vector are accentuated while many others are suppressed
by the constraint on total activity, and the output
of each glomerulus is a “thermometer-coded” version
of this processed signal, in which the signal intensity
is represented by the total number of active cells
(due to differing thresholds) within the glomerulus.
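
The bulb processing summarized above can be illustrated with a minimal sketch. It is not the authors' formulation: the function names, the evenly spaced thresholds, the choice of sixteen cells per glomerulus, and the bisection search for a global gain are all assumptions made for illustration. The sketch saturates each scaled input low and high, thermometer-codes it, and picks the largest gain for which no more than about 20% of all mitral cells are active.

```python
def clip01(x):
    """Saturate a signal low at 0 and high at 1 (full scale)."""
    return max(0.0, min(1.0, x))


def thermometer(level, n_cells):
    """Cell k (k = 1..n_cells) is active when the level reaches threshold k/n_cells.

    Evenly spaced thresholds are an assumption of this sketch.
    """
    return [1 if level * n_cells >= k else 0 for k in range(1, n_cells + 1)]


def bulb_output(net_inputs, n_cells=16, target_fraction=0.2):
    """Return one thermometer code per glomerulus under a total-activity budget."""
    budget = int(target_fraction * n_cells * len(net_inputs))

    def codes(gain):
        return [thermometer(clip01(gain * x), n_cells) for x in net_inputs]

    # Bisection on the global gain (hypothetical stand-in for the
    # normalization mediated by inhibitory cells in the model).
    lo, hi = 0.0, 1e6
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if sum(map(sum, codes(mid))) > budget:
            hi = mid
        else:
            lo = mid
    return codes(lo)


if __name__ == "__main__":
    for code in bulb_output([0.9, 0.5, 0.1, 0.0, 0.05]):
        print("".join(map(str, code)))
```

Strong components occupy most of the activity budget, while weak ones produce few or no active cells, which is the accentuation and suppression effect described in the text.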

The outputs of the mitral cells then project to the
piriform cortex via the lateral olfactory tract (LOT).
Synapses with piriform cells, which are excitatory, are
sparse and combinatorial rather than topographic: they
appear to be made essentially at random, with a relatively
low probability (on the order of 10%). (Piriform
cells in the caudal region of the piriform cortex also
receive excitatory inputs from cells in the rostral piriform
via associational fibers, although this feature will
not be discussed in any detail in this paper.) The excitatory
piriform cells are arranged in groups or patches,
which are defined by strong local inhibition that results
in a “winner-take-all” characteristic: only one or a few
of the most strongly stimulated cells within each patch
reach an active state at any one time. These cells are
also modeled as two-state devices. The sparse pattern of
winning cells within the patches is regarded as the spatially
encoded output of the olfactory bulb/piriform system;
these active cells are those which happen to receive
a relatively large number of their synapses from active
mitral cells. After a burst of activity, piriform cells
undergo afterhyperpolarization, which results in a refractory
period of negligible or very reduced excitability.
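
A short sketch may help fix ideas about the feedforward stage. It assumes unit weights, exactly one winner per patch, and hypothetical names (`make_connections`, `piriform_response`); none of these details are specified in this form by the model. Connections are drawn independently with probability 0.1, each piriform cell sums its active inputs, and the most strongly driven non-refractory cell in each patch wins.

```python
import random


def make_connections(n_mitral, n_piriform, p=0.1, seed=0):
    """Sparse random connectivity: conn[j][i] = 1 if mitral i synapses on piriform j."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for _ in range(n_mitral)]
            for _ in range(n_piriform)]


def piriform_response(mitral, conn, patch_size, refractory=frozenset()):
    """Return indices of winning piriform cells, at most one per patch.

    Cells listed in `refractory` cannot win (a crude stand-in for
    afterhyperpolarization); one winner per patch is an assumption.
    """
    drive = [sum(c * m for c, m in zip(row, mitral)) for row in conn]
    winners = []
    for start in range(0, len(drive), patch_size):
        stop = min(start + patch_size, len(drive))
        candidates = [j for j in range(start, stop) if j not in refractory]
        if candidates:
            best = max(candidates, key=lambda j: drive[j])
            if drive[best] > 0:
                winners.append(best)
    return winners


if __name__ == "__main__":
    conn = make_connections(100, 40)
    mitral = [1 if i < 20 else 0 for i in range(100)]  # 20% of mitral cells active
    first = piriform_response(mitral, conn, patch_size=10)
    second = piriform_response(mitral, conn, patch_size=10, refractory=set(first))
    print(first, second)
```

Passing the first set of winners as refractory on the next call yields a different output code for the same input, which is the mechanism the model relies on during repeated sampling.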

The active piriform cells in turn inhibit the glomeruli
in the bulb via another pathway (this is the feedback inhibition
which is summed with glomerular inputs). The
inhibition is effected by means of synapses which develop
according to a correlational or Hebb-type learning rule,
resulting in strongest inhibition of those glomeruli most
responsible for the firing of “winning” piriform cells.


The  reciprocal  process  of  feedforward  excitation  of 
the  piriform  by  the  olfactory  bulb  followed  by  feedback 
inhibition  of  the  bulb  by  the  piriform  is  repeated  cyclic¬ 
ally  at  the  so-called  theta  rhythm,  to  which  activity  in 
this  part  of  the  brain,  as  well  as  the  animal’s  sniffing 
behavior,  is  synchronized.  Feedback  inhibition  of  the 
bulb during this multiple sampling is cumulative. Thus,
as  the  animal  sniffs  a  single  odor,  the  following  se¬ 
quence  takes  place  in  the  naive  network:  after  the  first 
sniff,  the  glomeruli  with  the  most  significant  input  com¬ 
ponents  are  most  strongly  inhibited,  allowing  second¬ 
ary  components  to  elicit  more  significant  responses 
from  their  glomeruli  during  the  next  sniff.  In  subse¬ 
quent  sniffs,  these  components  are  also  inhibited  allow¬ 
ing  still  weaker  components  to  be  expressed,  and  so 
on  in  a  hierarchical  fashion.  At  each  step  in  this  hier¬ 
archy,  a  novel  piriform  output  code  is  guaranteed  by 
the  refractory  state  of  previously  active  piriform  cells. 

Learning  in  this  system,  which  is  modeled  after 
long-term  potentiation,  is  coactivity-based:  the  weights 
of  excitatory  synapses  from  active  mitral  cells  onto 
“winning”  piriform  cells  are  incremented.  Learning 
is  mediated  by  external  inputs  from  higher  cortical 
regions  (i.e.,  it  can  be  turned  on  or  off).  Weights  can 
saturate;  when  fully  potentiated  they  are  larger  than 
naive  weights  by  a  factor  of  only  two  to  three.  Learn¬ 
ing  increments  are  of  constant  magnitude  and  typically 
represent  5%-10%  of  the  range  between  naive  and  fully 
potentiated  weights.  LTP,  as  the  name  implies,  is  a  long- 
lasting  phenomenon  in  which  measurable  weight  decay 
is  not  observed. 
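
The learning rule described above can be sketched as follows. The specific numbers (naive weight 1.0, saturation ratio 2.0, increment equal to 10% of the range) are picked from within the ranges quoted in the text, and the convention that a zero weight marks an absent synapse is an assumption of this sketch.

```python
def potentiate(weights, mitral, winners, naive=1.0, ratio=2.0, step_fraction=0.1):
    """Coactivity-based, unidirectional weight update with saturation.

    weights[j][i] is the synapse from mitral cell i onto piriform cell j;
    a value of 0 means no synapse exists (assumption of this sketch).
    """
    w_max = naive * ratio
    step = step_fraction * (w_max - naive)  # constant-magnitude increment
    for j in winners:
        for i, active in enumerate(mitral):
            if active and weights[j][i] > 0:
                weights[j][i] = min(w_max, weights[j][i] + step)
    return weights


if __name__ == "__main__":
    w = [[1.0, 1.0, 0.0],
         [1.0, 1.0, 1.0]]
    for _ in range(3):
        potentiate(w, mitral=[1, 0, 1], winners=[0])
    print(w)
```

Only synapses joining an active mitral cell to a winning piriform cell change, and repeated coactivity drives them to the saturated value and no further.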

The effect of learning in this model is that the network
develops a tendency to cluster its input vectors:
the output codes for vectors sufficiently close in the
input space become very similar or identical, as the
weights associated with piriform cells that have “won”
most frequently become larger. Moreover, the feedback
from piriform to bulb then tends to inhibit the
glomeruli  not  simply  in  proportion  to  their  activity,  but 
rather  in  relation  to  the  expected  activity  for  the  cluster 
mean.  Thus,  not  only  are  glomeruli  with  significant 
input  components  suppressed,  but  in  addition,  dif¬ 
ferences  between  the  input  vector  and  the  cluster  mean 
tend  to  be  accentuated.  The  net  result  is  that,  during 
the  multisampling  process,  a  hierarchical  clustering 
takes  place,  in  which  initial  output  codes  indicate  broad 
class  or  cluster  membership,  and  subsequent  codes, 
subcluster  or  narrower  class  membership.  Cluster  and 
subcluster breadth in the input vector space are influenced
by the weight increment size, the ratio of
saturated  to  naive  weight  values,  and  the  data  sample 
on which the network learns. The essential features of
this  model  have  been  abstracted  and  embedded  in  a 
somewhat  simplified  version,  whose  resemblance  to 
several  other  unsupervised  clustering  algorithms  has 
been noted [45, 46].

A  number  of  features  of  this  model  are  particularly 
favorable  for  simple  direct  implementation.  The  neuron 
models  are  two-state  devices,  and  consequently,  four- 
quadrant  multipliers  are  not  required  to  implement  the 
interconnection  weights;  in  fact,  single  transistors  suf¬ 
fice.  However,  most  crucially,  the  weights  require  only 
low  precision,  on  the  order  of  3-5  bits,  and  learning 
in  the  network  comprises  coarse,  unidirectional  weight 
changes  which  take  place  according  to  a  simple  Hebb- 
type  or  coactivity-based  update  rule.  Weights  saturate 
as  well,  and  this  is  a  natural  feature  to  be  expected  of 
any  analog  storage  medium. 

3.  Implementation

We  propose  a  direct  implementation  of  this  algorithm 
in  the  form  of  a  synchronous,  analog  silicon  model  in 
CMOS  circuitry.  The  importance  of  the  theta  rhythm 
for  the  network  function  of  hierarchical  clustering  sug¬ 
gests  the  suitability  of  an  approach  which  is  syn¬ 
chronous  or  clocked  at  the  highest  level  of  function. 
External  inputs  (analogous  to  inputs  from  olfactory 
receptors)  would  be  sampled  periodically  at  an  artificial 
“theta  rhythm.”  For  each  cycle  of  this  rhythm,  there 
would  be  two  major  phases:  activation  of  the  bulb  and 
feedforward  excitation  of  the  piriform,  followed  by  feed¬ 
back  inhibition  of  the  bulb  by  the  piriform.  Between 
clock  cycles,  however,  computation  of  neuronal  inputs 
and activations would be analog, asynchronous, and
carried  out  in  parallel.  We  also  propose  to  implement 
network  learning,  with  modifiable  nonvolatile  weights 
which  are  updated  in  parallel  according  to  the  Granger/ 
Lynch/Ambros-Ingerson  model  when  network  plasticity 
is  desired.  Below  we  discuss  the  general  approach,  and 
then  present  circuits  designed  to  implement  the  requisite 
functions. 


3.1.  General Approach and Architecture

Following the Granger/Lynch/Ambros-Ingerson model,
neuronal  analogs  in  both  the  bulb  and  piriform  layers 
are  two-state  devices.  In  the  bulb,  net  inputs  to  the 
glomeruli  are  formed  by  combining  positive  external 
input signals with (negative) inhibitory feedback, and
these net inputs are then subject to nonlinear processing
and normalization. Within the framework suggested by
the biological model, we have developed a pair of alternatives
for this processing/normalization which are implementable
with closed-loop circuits similar to those
used in automatic gain control (AGC). One most closely
follows the form given by Ambros-Ingerson [45], consisting
of a vector AGC loop with sigmoidal nonlinearity
acting on each component within the loop, as illustrated
in figure 1a. A second includes an AGC loop
without the sigmoids, but with a global offset added
to each component within the loop such that the largest
net input elicits maximal activity from its glomerulus.
This offset is computed by a fast inner loop, as shown
in figure 1b. The second scheme may offer some representational
advantages, but the relative applicability of
the two approaches is currently under investigation in
system-level simulations.
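
The second normalization scheme can be sketched numerically under one reading of the description. It is assumed here that each output is `max(0, gain * x + offset)`, that the inner loop sets the offset so the largest component sits exactly at full scale, and that the outer loop adjusts the gain until the outputs sum to a target total. The function name, the bisection search (used in place of simulating loop dynamics), and the requirement that the target lie between one and n times full scale are assumptions of this sketch.

```python
def normalize_with_offset(net, full_scale=1.0, total=1.5, iters=80):
    """Gain-plus-offset normalization, outputs saturating low at zero.

    Assumes full_scale <= total <= len(net) * full_scale so a solution exists.
    """
    peak = max(net)

    def outputs(gain):
        # Inner loop (idealized): offset places the largest input at full scale.
        offset = full_scale - gain * peak
        return [max(0.0, gain * x + offset) for x in net]

    # Outer loop (idealized): total activity falls as gain rises, so bisect.
    lo, hi = 0.0, 1e6
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(outputs(mid)) > total:
            lo = mid
        else:
            hi = mid
    return outputs(hi)


if __name__ == "__main__":
    print(normalize_with_offset([0.2, 0.5, 1.0, 0.9]))
```

Raising the gain while pinning the peak at full scale progressively drives the smaller components to zero, so the largest input always elicits a full-scale response regardless of how the activity budget is set.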

Subsequent to this normalization, the processed
signals are thermometer-coded by the two-state mitral
neuron models in each glomerulus. Individual mitral
cell analogs respond with a binary output, indicating
active or inactive.

In the piriform model, subnetworks of neuron
analogs are arranged in winner-take-all patches, each
operating with a single global feedback line to achieve
patchwide inhibition of “losing” cells. Global feedback
implies that an N-cell patch would be implementable
with complexity of order N. Such feedback networks
have been described by Lazzaro et al. [49].

For “synaptic” weights, we propose the use of analog
floating-gate memory in conjunction with a single-transistor
weighting element whose conductance is modulated
by charge on the floating gate. Because 10 or fewer
distinct synaptic strengths are required for the LOT
synapses in the Granger/Lynch/Ambros-Ingerson model
[44-48], analog floating gates would seem to pose
little risk. Long-term (decades) retention of at least •
bits of resolution has been estimated by extrapolation
from high-temperature charge-relaxation data on floating-gate
circuits used in an analog neural network implementation
[12].

In the model, the synapses from mitral cells onto
piriform cells form a sparse, random interconnection
matrix. The approach which we propose to implement
this matrix employs a simple one-to-one correspondence
of the number of weighting elements to number
of synapses in the model, with mask-programmable
connection of input and output lines allowing establishment
of the sparse pseudorandom connectivity. The
physical weight matrix is composed of cells containing

Fig. 1. Schematic diagrams for normalization of net input vectors to hierarchical clustering network. C‘ represent net input components and
'■ normalized input components. (a) Scheme which closely follows the original biological model, with sigmoidal nonlinearity blocks included
in the feedback loop. Ka is a reference level corresponding to desired total activation. (b) Scheme which ensures that the largest net input
component elicits a full-scale response. FS is a reference level corresponding to full-scale activation. Normalized output components are assumed
to saturate low at zero.




one or more weighting transistors and the crossing of
several mitral output and piriform input lines, and interconnections
are established at random between pairs
of input and output lines within each cell. We consider
a prototype for this concept in which a basic weight
cell contains two weighting transistors and the crossings
of four mitral output lines and five piriform input lines.
Any input line may be interconnected with any output
line, with the caveat that double interconnection between
a given pair of lines is excluded; a connectivity
ratio of 1:10 is thus maintained by the use of this cell.
The connections are established at layout time by a




302  Shoemaker,  Hutchens  and  Patil 

macro  which  generates  a  randomized  list  and  then  places 
geometries  on  the  appropriate  mask  layer(s)  to  establish 
the  interconnections  in  the  layout  database.  The  objec¬ 
tive  of  this  approach  is  to  minimize  interconnect  and 
routing  area  and  conserve  the  number  of  devices  re¬ 
quired  in  the  interconnection  matrix,  which  factors  are 
of concern [7, 50] in a direct, nonmultiplexed implementation.
Assuming scalable design rules, we
estimate  the  area  required  for  this  scheme  is  on  the 
order  of  one-fifth  to  one-tenth  the  area  estimate  given 
by Hammerstrom and Means [50] for direct implementation,
and which is cited by them as a motivating factor
for development of a broadcast multiplexed digital
architecture  as  an  alternative  to  the  direct  analog 
approach. 
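
The layout-time macro is not given in the text, but its effect can be sketched under simple assumptions: the matrix is tiled with basic cells of four mitral lines by five piriform lines, and within each cell two distinct crossings out of the twenty are chosen at random to receive weighting transistors. The function name, the tiling order, and the use of a seeded generator are hypothetical.

```python
import random

MITRAL_PER_CELL = 4      # mitral output lines crossing one basic cell
PIRIFORM_PER_CELL = 5    # piriform input lines crossing one basic cell
TRANSISTORS_PER_CELL = 2


def layout_connections(n_cell_rows, n_cell_cols, seed=0):
    """Return the set of (mitral_line, piriform_line) pairs that get a weight."""
    rng = random.Random(seed)
    connections = set()
    for r in range(n_cell_rows):
        for c in range(n_cell_cols):
            crossings = [(r * MITRAL_PER_CELL + i, c * PIRIFORM_PER_CELL + j)
                         for i in range(MITRAL_PER_CELL)
                         for j in range(PIRIFORM_PER_CELL)]
            # Sampling without replacement excludes a double connection
            # between the same pair of lines.
            connections.update(rng.sample(crossings, TRANSISTORS_PER_CELL))
    return connections


if __name__ == "__main__":
    conns = layout_connections(10, 8)   # 40 mitral lines by 40 piriform lines
    print(len(conns), len(conns) / (40 * 40))
```

Because every cell contributes exactly two connections out of twenty crossings, the 1:10 connectivity ratio holds exactly for any matrix size, not merely on average.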

The  price  paid  for  the  simplicity  of  the  proposed 
architecture  is  the  forfeiture  of  a  certain  degree  of 
statistical  independence  of  the  connectivity.  For  exam¬ 
ple,  three  particular  LOT  lines  which  pass  through  the 
same basic weight cell have zero probability of synapsing
onto the same piriform input line, and three piriform
lines passing through the cell have zero probability
of  receiving  synaptic  input  from  the  same  LOT  line. 
Without  the  constraint  imposed  by  the  weight  cell,  the 
probability of either of these events is (1/10)^3 or
1/1000.  However,  as  a  consequence  of  the  central  limit 
theorem,  the  distribution  of  active  synapses  onto  the 
piriform  input  lines  becomes  similar  to  that  of  the  un¬ 
constrained  interconnection  pattern  of  the  original 
model  as  the  number  of  LOT  lines  increases.  We  have 
calculated both distributions for LOTs of several hundred
lines and mitral activity of 20%, and they are very
similar;  thus  use  of  the  weight  cell  is  not  regarded  as 
an  important  constraint  in  networks  which  are  suffi¬ 
ciently  large,  but  still  of  realizable  size. 
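
A comparison of this kind can be reproduced in outline. The sketch below is not the authors' calculation: it assumes that mitral cells are active independently with probability 0.2, that each piriform line passes through one basic cell per group of four LOT lines, and that each cell places its two transistors uniformly at random among its twenty crossings. Under those assumptions the unconstrained count of active synapses on a piriform line is binomial, and the constrained count is a sum of independent per-cell contributions.

```python
from math import comb


def convolve(p, q):
    """Distribution of the sum of two independent non-negative integer variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def unconstrained(n_lot, conn=0.1, act=0.2):
    """Binomial count of active synapses with independent random connections."""
    p = conn * act
    return [comb(n_lot, k) * p**k * (1 - p)**(n_lot - k) for k in range(n_lot + 1)]


def constrained(n_lot, act=0.2):
    """Same count when connections come from 4 x 5 cells holding two transistors."""
    ways = comb(20, 2)
    # Probability that k of the cell's two transistors land on this piriform line,
    # which owns 4 of the 20 crossings.
    pk = [comb(4, k) * comb(16, 2 - k) / ways for k in range(3)]
    cell = [pk[0] + pk[1] * (1 - act) + pk[2] * (1 - act) ** 2,
            pk[1] * act + pk[2] * 2 * act * (1 - act),
            pk[2] * act ** 2]
    dist = [1.0]
    for _ in range(n_lot // 4):
        dist = convolve(dist, cell)
    return dist


def mean(dist):
    return sum(k * p for k, p in enumerate(dist))


def total_variation(p, q):
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    u, c = unconstrained(400), constrained(400)
    print(mean(u), mean(c), total_variation(u, c))
```

The two distributions share the same mean by construction; the total variation distance gives one way to quantify how closely they agree for a given number of LOT lines.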

To implement feedback inhibition of the bulb by the
piriform, we propose a time-duplex scheme. The
original algorithm calls for distinct feedback paths from
piriform  to  bulb,  with  inhibitory  synapses  trained  ac¬ 
cording  to  a  correlative  or  Hebb-type  learning  rule  in 
a  developmental  phase  prior  to  the  application  of  struc¬ 
tured  input.  However,  since  these  correlations  arise  in 
direct  consequence  of  the  given  connectivity  of  the  LOT 
synapses,  the  same  effect  can  be  obtained  by  using  the 
transpose  of  the  LOT  weight  matrix  to  compute  bulbar 
inhibition.  Physically,  this  implies  that  a  single  weight 
matrix  can  be  used  to  compute  excitatory  bulbar  input 
to piriform, followed by inhibitory currents from piriform
feedback to bulb. In the second phase, winning
piriform  cells  would  drive  the  weight  matrix,  and  the 
output  currents  would  be  summed  over  each  glomerulus 
on the bulb side to obtain the inhibition for that sample
or “sniff.”
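
The two-phase use of a single weight matrix can be sketched in a few lines. The function names and the small example matrix are illustrative only; `W[j][i]` is taken to be the weight from mitral line i to piriform line j, with mitral lines grouped consecutively by glomerulus.

```python
def feedforward(W, mitral):
    """Phase 1: mitral states drive the matrix; each piriform line sums its inputs."""
    return [sum(w * m for w, m in zip(row, mitral)) for row in W]


def feedback(W, piriform, cells_per_glomerulus):
    """Phase 2: winning piriform cells drive the same matrix in the transpose sense.

    Currents on the mitral-side lines are then summed over each glomerulus.
    """
    n_mitral = len(W[0])
    per_line = [sum(W[j][i] * piriform[j] for j in range(len(W)))
                for i in range(n_mitral)]
    return [sum(per_line[s:s + cells_per_glomerulus])
            for s in range(0, n_mitral, cells_per_glomerulus)]


if __name__ == "__main__":
    W = [[1, 0, 2, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 1]]          # 3 piriform lines by 4 mitral lines
    print(feedforward(W, [1, 0, 1, 0]))
    print(feedback(W, [1, 0, 0], cells_per_glomerulus=2))
```

The glomerulus that contributed most to the winning cell's drive receives the largest inhibition, without any separately stored feedback weights.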

For individual weights, the control logic for the
coactivity-based learning rule corresponds to a simple
AND function; taken in parallel it may be regarded as
a Boolean outer product. This can be implemented with
crossbars running through the weight matrix using simple
switches which are controlled by the neuron states
and which route programming voltages to writing circuitry
for the floating-gate weights.

A block diagram representing an overview of the
proposed system is shown in figure 2.

3.2.  Circuit  Designs 

Many of the functions which are required to implement
the model as described above may be achieved with
well-known analog building blocks. In designing our
circuitry, a current-mode approach was adopted for
reasons of improved bandwidth and noise immunity.
(Voltage-mode signals are assumed at network inputs
and outputs, however, for convenience of external interface.)
A settling time on the order of several hundred
nanoseconds was targeted for feedforward excitatory and
feedback inhibitory phases of network operation.
Current-mode circuits in addition permit a simple solution
to the proposed bidirectional, time-multiplexed use
of the weight matrix. Interface is made to the weight
matrix on both the mitral and piriform sides via type-two
current conveyors (CCII) [51], which act as bidirectional
buffer/drivers. In the CCII design shown in figure
3, a folded-cascode differential amplifier is used as a
gain element for wide bandwidth. Its positive input
serves as the reference (Y) terminal of the conveyor;
a class AB output stage (MFN and MFP) coupled to
the negative input forms the voltage-following (X) terminal,
and the current output of this stage is in turn
copied to give the current (Z) output of the conveyor.

Two options for the initial processing and normalization
of the input vector are shown schematically in figures
1a and 1b, as noted in Section 3.1; we describe the
salient components below. For multiplication by the
global gain in the AGC loop, both simple voltage-controlled
active loads and a more complex transconductance
multiplier for improved linearity are under
consideration. The transconductance multiplier is a
modified dual-quad circuit. The sigmoid nonlinearity
of the first preprocessing option is imposed by the circuit
shown in figure 4, in which the bias 0G sets the
threshold and Vc sets saturation. The input load is in
practice a complementary series pair of MOSFETs

Fig. 2. Overview of proposed system. Integer g indicates number of input components (and bulb glomeruli), m indicates number of levels
in the thermometer coding of net inputs, p indicates the number of winner-take-all piriform patches, and h indicates the number of cells per
patch. O_i are external inputs, G_i are net inputs, G_i are normalized inputs (i = 1, ..., g), i_j are feedback inhibition components (j = 1, ...,
m*g), and I_i are accumulated inhibition for each glomerulus (i = 1, ..., g). LOT indicates the lateral olfactory tract analog, WT the
transposable weight matrix, and WTA winner-take-all.


Fig. 3. Schematic of type-two current conveyor (CCII).


Fig. 4. Sigmoidal nonlinearity circuit with current-mode output. The
biases  0C  and  Vc  control  threshold  and  saturation  characteristics  of 
the  function,  respectively. 
strongly biased in the triode region. The four cross-coupled
n-channel transistors, M1-M4, when in saturation,
impress the input voltage less the bias 0G across
nonlinear (saturating) load M9, and the current through
M9 is copied to provide the output of the circuit. In
the second option, the offset needed to elicit a full-scale
response to the largest input component is computed by a
fast inner closed-loop circuit as depicted in figure 1b,
in which the output of a maximum detection circuit (not
depicted) is compared against a full-scale reference. As
a  gain  element  in  these  loops,  the  folded-cascode  dif¬ 
ferential  amplifier  embedded  in  the  CCII  circuit  of 
figure  3  may  be  used  with  the  two  output  terminals 
connected. 

The  thermometer-coding  function  of  each  glomer¬ 
ulus  is  achieved  with  a  circuit  analogous  to  the  first 
stage  of  a  parallel  analog-to-digital  converter,  as  illus¬ 
trated  in  figure  5.  A  voltage  ladder  is  established  by 
a  series  of  identical  capacitors.  Full-scale  voltage  is  set 
globally  by  equilibrating  full-scale  input  current  across 
a  load  (again  composed  of  active  devices  biased  strong¬ 
ly  in  the  triode  region).  In  a  VLSI  network,  the  full- 
scale  current  could  be  copied  and  routed  to  loads  in 
each  glomerulus  to  maintain  accuracy.  The  preproc¬ 
essed  input  current  for  each  glomerulus  is  equilibrated 
across  an  identical  load  and  the  resulting  voltage  com¬ 
pared  against  each  step  of  the  voltage  ladder  by  a  series 
of  comparators,  whose  outputs  represent  the  states  of 
the  mitral  cells  within  the  glomerulus. 
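
The thermometer coding described above can be illustrated with a short behavioral sketch in Python. The function name `thermometer_code`, the uniform ladder with its k-th step at k/m of full scale, and the ideal (offset-free) comparators are assumptions made for illustration only; m = 16 follows the sixteen comparators of figure 5.

```python
def thermometer_code(i_in, i_full_scale, m=16):
    """Return m comparator outputs (1 = mitral cell active) for one glomerulus.

    Assumes ideal comparators and a uniform ladder whose k-th step sits at
    k/m of the full-scale voltage (k = 1..m); both are illustrative choices.
    """
    if i_full_scale <= 0:
        raise ValueError("full-scale current must be positive")
    # Identical loads convert both currents to voltages, so only the ratio matters.
    ratio = i_in / i_full_scale
    return [1 if ratio >= k / m else 0 for k in range(1, m + 1)]


code = thermometer_code(i_in=37.5, i_full_scale=100.0, m=16)
print(code)       # the six lowest comparators are on, the rest are off
print(sum(code))  # number of active mitral cells: 6
```

A larger input simply turns on more comparators from the bottom of the ladder upward, which is why nonuniform step sizes matter little as long as the code stays monotonic.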


Fig. 5. Thermometer coding circuit. I_R is a reference current
corresponding to full-scale input, and I_in is the input current. The two
load resistances are composed of active devices in practice, and are
identical. LOT1-LOT16 are comparators whose outputs constitute
the thermometer encoding of the input.


When the network is in the feedforward mode, the
reference (Y) inputs of the current conveyors for active
mitral cells are switched to ground while others are
switched to a high reference. On the piriform side, all
reference inputs are switched to the high reference. The
X terminal voltage follows the Y input per normal CCII
operation.

The weighting elements in the weight matrix each
comprise an individual floating-gate p-channel transistor.
The floating gate on the first polysilicon layer is
capacitively coupled to a "control gate" on the second
polysilicon layer, and the bias applied to the poly-2
control gate is used to establish the transconductance
corresponding to the naive weight, when the floating gate is
uncharged. The bias capacitor is also used to apply a
programming voltage during learning, to be discussed
below. Negative charge on the floating gate increases
the transistor transconductance and thus the weight
associated with the interconnection. Current flows via the
weighting transistors to active mitral cell conveyors from
piriform conveyors, while no appreciable current flows
to inactive mitral cell conveyors from piriform
conveyors since both reference inputs are at the same level.
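
The feedforward summation described above can be sketched behaviorally as follows. The sketch assumes that each weight delivers a fixed current when its mitral line is active and nothing otherwise, and that the sparse interconnection matrix is stored as an explicit map; the function name `piriform_input_currents`, the data layout, and the example values (in µA) are hypothetical.

```python
def piriform_input_currents(active_mitral, weights):
    """Sum weight currents into each piriform cell.

    active_mitral: set of indices of active mitral (LOT) lines.
    weights: dict mapping (mitral_index, piriform_index) -> current in uA
             delivered when that mitral line is active (sparse matrix).
    Inactive lines are assumed to contribute no appreciable current.
    """
    currents = {}
    for (mitral, piriform), w in weights.items():
        currents.setdefault(piriform, 0.0)
        if mitral in active_mitral:
            currents[piriform] += w
    return currents


# Hypothetical connectivity: naive weights of 5 uA, one potentiated to 8 uA.
weights = {(0, 0): 5.0, (1, 0): 8.0, (2, 0): 5.0, (1, 1): 5.0, (3, 1): 5.0}
print(piriform_input_currents({0, 1}, weights))  # {0: 13.0, 1: 5.0}
```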

The current (Z) outputs of the piriform current
conveyors are routed as inputs to winner-take-all circuits
which define the piriform patches. The winner-take-all
circuit depicted in figure 6 operates with global
feedback much like the circuit of Lazzaro et al. [49], but
is designed for improved sensitivity. It is reset at the
beginning of each sniff by transistor M5, which is
distributed in each of the piriform cell analogs, and
which discharges the common gate of transistors M1
to Vss. When M5 is shut off, this common gate is
charged by the incoming currents, and when the M1
devices turn on, each begins to sink a portion of the
input current for its cell. In all but the cell with the
maximum input, the current drawn by M1 reaches then
exceeds the input current, and the difference current
must be drawn via M4. At this transition, the voltage
at the input node falls from a threshold above ground
to a threshold below. The input node of the single
winner remains near one threshold above ground, with M3
conducting just sufficiently to balance the leakage
current from the common gate of the M1 transistors. The
voltages at the input nodes are amplified and level
shifted by inverters to give the piriform outputs as
0-5 V logic. Transistors M2 in figure 6 are cascodes
included to prevent large swings in the drain voltage
of the M1 devices.

Fig. 6. Winner-take-all circuit for piriform patch with h cells. Ip are input currents.

This analysis assumes that discharge of capacitance
at the circuit inputs is fast relative to the charging of the
feedback capacitance at the gates of M1. If this is not the
case, then the gates of M1 may overcharge and draw a
current greater than the maximum input current during
settling, in which case all outputs are pulled low, and
remain so while the feedback node is drawn down by
leakage off the feedback capacitance, until the M1
currents decrease to the maximum input and M3 for the cell
with maximum input is forced to the edge of
conduction.
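
The intended outcome of the winner-take-all circuit described above can be summarized with a purely behavioral sketch. It does not model the transistor dynamics or the overcharging case; it only marks the cells whose input lies within a finite resolution of the maximum, then applies a tie rule. The function names, the `resolution` parameter, and the lowest-index tie rule are assumptions for illustration.

```python
def winner_take_all(currents, resolution=0.0):
    """Return 0/1 outputs, 1 marking cells within `resolution` of the maximum.

    Behavioral model only: a real circuit cannot separate inputs that
    differ by less than its resolution, so several cells may be flagged.
    """
    if not currents:
        raise ValueError("at least one input current is required")
    peak = max(currents)
    return [1 if peak - i <= resolution else 0 for i in currents]


def resolve_tie(outputs):
    """Keep only the lowest-indexed flagged cell (one possible tie rule)."""
    first = outputs.index(1)
    return [1 if k == first else 0 for k in range(len(outputs))]


# Input pattern of the simulated near-worst case: 28 low, 3 high, 1 winner (uA).
inputs = [60.0] * 28 + [138.0] * 3 + [140.0]
print(winner_take_all(inputs).index(1))                       # 31
print(sum(winner_take_all(inputs, resolution=2.1)))           # 4 cells flagged
print(resolve_tie(winner_take_all(inputs, 2.1)).index(1))     # 28
```

With an ideal circuit only the 140 µA input wins; with a 2.1 µA resolution the three 138 µA inputs are indistinguishable from it, which is the situation a tie-resolving stage has to handle.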

The sparse pattern of piriform winners from these
winner-take-all circuits constitutes the output of the
network. Time multiplexing and/or digital encoding would
be used in practice to take this data off-chip, in order to
limit pin count. To ensure a valid binary code, a digital
logic-based tie-resolving circuit has been developed to
obtain a single winner from the output of the analog
winner-take-all circuit. These circuits are conventional and
of secondary concern, and will not be considered further.

After piriform winners are established, the feedback
inhibitory phase of the network operation takes place.
Piriform states are latched, and the reference inputs for
the conveyors of the winning cells are switched to high
reference, while those of the losers and of the conveyors
on the bulb side are grounded. The output currents of
the conveyors for each glomerulus in the bulb are
summed and used to determine level of inhibition.
Within the general framework of the biological model,
several schemes for computation of inhibition are under
investigation, ranging from scaling to thresholding of
accumulated feedback current before subtraction from
external input current. To accumulate feedback over a
series of sniffs, a current copier/integrator has been
designed as shown schematically in figure 7. The
current copier/integrator operates under control of a clock
with two (nonoverlapping) phases, the first of which
must fall within the feedback phase of the system clock.
It is reset before each series of sniffs by discharging
hold capacitors CH to [...] or [...]. It includes dynamic
current mirrors to enhance the accuracy of the current
copying function.
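
As a rough illustration of the inhibition schemes mentioned above, the sketch below accumulates feedback over a series of sniffs and subtracts a scaled or thresholded version from the external input. The function names, the `scale` and `threshold` parameters, and the clipping of the net input at zero are all assumptions; the text states only that schemes ranging from scaling to thresholding are under investigation.

```python
def accumulate_feedback(feedback_per_sniff):
    """Running total of feedback current over a series of sniffs.

    The total starts from zero, mirroring the reset before each series.
    """
    total = 0.0
    history = []
    for feedback in feedback_per_sniff:
        total += feedback
        history.append(total)
    return history


def net_input(external, accumulated, scale=1.0, threshold=0.0):
    """External input minus inhibition derived from accumulated feedback.

    scale and threshold select between the two hypothetical schemes;
    the result is clipped at zero (an assumption, not from the source).
    """
    inhibition = scale * max(accumulated - threshold, 0.0)
    return max(external - inhibition, 0.0)


history = accumulate_feedback([10.0, 20.0, 5.0])
print(history)                                                # [10.0, 30.0, 35.0]
print([net_input(100.0, acc, scale=0.5) for acc in history])  # [95.0, 85.0, 82.5]
```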

Fig. 7. Current copier/integrator. I_in is input current (summed over m LOT lines connected with a single glomerulus), and I_o is the accumulated
output current. φ1 and φ2 are nonoverlapping clock phases which control opening and closing of transmission gate switches depicted.

During learning, we propose to exploit simple drain-side
hot-electron injection onto the floating gates of the
weighting transistors through a gate oxide of usual
thickness. This obviates the need for EEPROM or other
special processing to implement the floating-gate weights.
A scheme for performing coactivity-based updating is
outlined as follows. For each mitral output line in the
LOT, a corresponding bias line is fabricated which
contacts the control gate of every weighting transistor
connected to the mitral line. During normal operation, these
bias lines are all set at a common bias voltage used to
establish the naive weight value. When the weights are
to be updated, the bias lines corresponding to active
mitral cells are switched to a high-voltage programming
line via high-voltage switches, while on the piriform
side, the reference inputs of current conveyors for
winning piriform cells are strobed to the negative rail,
pulling the drains of the weighting transistors for those cells
to nearly the same potential. It is assumed that the
amplitude of the programming voltage less the lower rail is
sufficient to allow injection of some appropriate amount
of charge. In this way, the weights interconnecting
coactive mitral and piriform cells are incremented.
Meanwhile, the reference inputs of the mitral and losing
piriform current conveyors are maintained at an intermediate
potential such as ground. It is assumed that the
programming voltage less the intermediate voltage does not cause
injection of significant charge. In addition, the bias lines
of inactive mitral cells are held at some potential
sufficiently high to maintain the corresponding transistors
in a strongly accumulated state and prevent significant
channel current in any devices connected to winning
piriform cells. In this way, the update rule may be
implemented in parallel without drawing large currents.
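
The logical effect of the coactivity-based update described above can be sketched as follows: only weights joining an active mitral line to a winning piriform cell are incremented, and all others are left unchanged. The function name `update_weights`, the fixed `increment`, and the optional ceiling `w_max` are assumptions for illustration; in the physical device the increment depends nonlinearly on the charge already stored on the floating gate.

```python
def update_weights(weights, active_mitral, winning_piriform,
                   increment=1.0, w_max=None):
    """Increment weights joining active mitral lines to winning piriform cells.

    weights: dict mapping (mitral_index, piriform_index) -> weight value.
    A constant increment is a simplification; w_max optionally models
    saturation of the weight.
    """
    updated = {}
    for (mitral, piriform), w in weights.items():
        if mitral in active_mitral and piriform in winning_piriform:
            w = w + increment
            if w_max is not None:
                w = min(w, w_max)
        updated[(mitral, piriform)] = w
    return updated


weights = {(0, 0): 5.0, (1, 0): 5.0, (1, 1): 5.0, (2, 1): 5.0}
new = update_weights(weights, active_mitral={0, 1}, winning_piriform={0})
print(new)  # {(0, 0): 6.0, (1, 0): 6.0, (1, 1): 5.0, (2, 1): 5.0}
```

Because the condition is evaluated independently for every weight, all coactive connections can be updated at once, which corresponds to the parallel update obtained by driving the bias lines and conveyor references together.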

4. Results

Most of the circuits described above were fabricated
in a MOSIS 2 µm analog CMOS process, or an Orbit
Semiconductor 1.5 µm CMOS process intended
primarily for digital applications. Both processes had
double-polysilicon and double-metal layers. Testing
performed on these circuits was generally limited to dc
functionality as available test resources did not permit
full-bandwidth ac or real-time response testing, due
primarily to capacitive loading of input and output
nodes. Consequently, SPICE simulation results are
given to represent the ac frequency or transient response
of the circuits. Test results were obtained from either
two or three die.

In tests of the CCII circuit, the X output follows the
Y reference from -2.5 to 2.5 V under 1-kΩ load, and
the Z output stage is capable of tracking the X output
current from -2 to 1.75 mA. Simulated unity-gain
bandwidth into the 1-kΩ load is in excess of 20 MHz.

Several of the subblocks for the nonlinear
normalization circuitry were successfully fabricated and tested.
The transconductance multiplier exhibits an rms
linearity error (relative to full scale) of 1.7% and a maximum
absolute error of 5%. Bandwidth in simulations with
the output loaded by a diode-connected MOSFET (W/L
= 5 µm/5 µm) is 30 MHz. The sigmoid circuit behaves
qualitatively as expected in dc tests, with saturation and
threshold characteristics controllable by the two bias
voltages Vc and θC, respectively. The simulated
small-signal bandwidth varies with state but exceeds
10 MHz across the range.

The dc transfer characteristic of the thermometer
coding circuit is qualitatively as expected, although
input capacitance of the comparators connected to the
capacitive ladder contributes to a nonuniformity in step
size of the quantization performed by the circuit. No
particular design measures were taken against such
variations as they are believed to be of little significance
as long as quantization is monotonic. In simulations
the unloaded comparator outputs respond to a full-scale
step input to the thermometer-coding circuit with rise
times in the range of 20-150 ns (see figure 8). Response
times are directly related to position of the comparator
reference input on the voltage ladder, which determines
magnitude of differential drive voltages. The same time
order of response occurs in the test circuits with
capacitively loaded outputs.

Fig. 8. Simulated transient response of thermometer coding circuit. The trace labeled Vin is the input voltage in response to the full-scale
step current input in the upper trace. The three leftmost traces on the lower graph are the outputs of comparators at the bottom of the voltage
ladder and the rightmost those of comparators at the top of the ladder.


A 32-stage winner-take-all test circuit was fabricated
and tested. It was found capable of resolving input
currents differing by 1-3 µA at total input levels of 70-140
µA. In eight tests on three circuits, the average
resolution was 2.1 µA. As a design target a figure of 5 µA
for the current output of a naive weight has been used,
so average resolution is better than half the design
current delivered by a single naive weight.

Without added capacitance at the feedback node, the
winner-take-all circuit with device geometries as
designed has been found to permit overcharging of the
feedback node in certain simulated worst-case
scenarios. An added capacitance of 2 pF was included in the
simulation summarized in figure 9, which depicts time
course of response of the circuit after reset in a
near-worst-case scenario in which the four largest input
currents are nearly equal and appreciably larger than the
others. The simulation includes no external capacitance
at the inputs and outputs. Time to determination of the
winner in this case is on the order of 120 ns. Improved
performance and elimination of the added capacitance
can be achieved by modification of the geometries of
devices in figure 6; in particular, widening of M1
will increase both capacitance at the feedback node and
the bypass current which discharges capacitance at the
input node via M1 and M2.

Fig. 9. Simulated transient response of a 32-stage winner-take-all
circuit. Twenty-eight inputs were at the low level of 60 µA, three were
at the high level of 138 µA, and the winning input was at 140 µA.
Examples of three corresponding outputs are shown. Time course
of resetting is indicated.

Due to a design/layout error, the current
copier/integrator displayed a large copying error (about 70%
at 40 nA) at the initial cycle in dc tests. Simulations
indicate the circuit to be operable at a clocking speed
of 10 MHz.

Floating gate test circuits were fabricated in the 1.5
µm digital process (which had a gate oxide thickness
of 25 nm), and tested according to the programming
scheme described in Section 3.2. Programming
voltages of 17-19.5 V total amplitude (control gate to
drain) were used, applied in pulses of several durations
and rise times. Positive-going control gate pulses
overlapped negative-going drain pulses to prevent channel
current flow. Figure 10 depicts shift in transistor
threshold voltage (relative to the control gate) observed
in one of these tests. These shifts are representative of
the potential changes of the floating gate. Useful shifts
required microseconds or tens of microseconds of total
programming time. Charge relaxation measurements
have not been made, although measurable charge loss
does not occur within days at room temperature. In
addition, in an experiment with 13 V, 1-µs programming
pulses applied to the control gate, the drain terminal
was grounded rather than pulsed to -5 V, which is the
null update state in the parallel learning scheme. No
measurable threshold shift was obtained after 1 ms of
programming time.

Fig. 10. Threshold voltage shift for a floating gate test circuit subjected to 1-µs, 18-V programming pulses with 120-ns rise times. Threshold
is measured relative to the control gate.

Several unresolved issues remain with regard to use
of this circuit as a nonvolatile programmable weight.
One is the strongly nonlinear dependence of charge
injection on floating gate potential relative to the drain,
which decreases as charge builds up on the gate. This
is reflected in figure 10, in which the abscissa is plotted
log-scale. The relationship does result in effective
saturation of the weight but the uneven increment sizes
during the first few pulses are of concern with respect
to the learning algorithm. Methods of circumventing
this problem (e.g., see [52]) are under consideration. In
addition, an intermittent rapid (single pulse) charge-up
of the floating gate was observed in tests with short (100
ns) overlap of drain pulses by control gate pulses and
total programming voltages of 19 V or greater,
suggesting a transient junction breakdown or similar
phenomenon generating large numbers of hot carriers. The
effect was not seen when the overlap was increased to
1 µs, however. Additional experiments are planned in
which overlaps, rise times and pulse widths will be
further varied.


5. Conclusions

We have described a model of a neural network which
is based upon the known anatomy and physiology of
the olfactory bulb and piriform cortex of olfactory
mammals [44-48]. This model includes the effects of
learning assumed to take place via long-term synaptic
potentiation, and it has been shown to be capable of
performing hierarchical clustering as a result of this
unsupervised learning. Moreover, network function is
statistically based and it does not require precise
components; in particular, the resolution of the weights
need only be 3-5 bits, and learning is via a simple
coactivity-based weight update rule. These
characteristics suggest the feasibility of a direct analog
implementation; we describe an ongoing effort toward such
implementation in CMOS integrated circuitry, which employs
current-mode designs, single transistor floating-gate
weights, and features parallel on-chip learning. Circuit
designs and test results from functional blocks on first
silicon are presented. It is estimated that a network with
upwards of 50K weights and with submicrosecond
settling times could be built with a conventional CMOS
double-poly process and die size.